Conversation
Can be removed.
- Sanitize user-controlled fileName parameter to prevent directory traversal - Add os.path.basename() to remove path components from filename - Validate filename contains no '..' or path separators - Add directory boundary check to ensure resolved path stays within DDA_SYSTEM_FOLDER - Addresses CodeQL rule py/path-injection - Prevents attacks like '../../../etc/passwd' from accessing system files
- Add path sanitization using os.path.basename() - Validate against directory traversal patterns (.., path separators) - Enforce directory boundary checks within DDA_SYSTEM_FOLDER - Addresses CodeQL rule py/path-injection
…ction - Add sanitization for workflowId and captureId parameters - Validate against directory traversal patterns (.., path separators) - Use sanitized components in jsonl file path construction - Addresses CodeQL rule py/path-injection for line 108
| detail="Access denied" | ||
| ) | ||
|
|
||
| if not os.path.exists(full_path): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
The best practice for managing potentially user-controlled file paths is:
- Normalize (canonicalize) the complete path after joining the user input to the intended base directory. This step removes symlinks, redundant path separators, and traversal segments like
... - Then, check after normalization that the resulting path is strictly within the intended parent directory using robust containment logic (not just
startswith, which is error-prone for directories like/tmp/fooand/tmp/foobar).
Edits required:
- In
src/backend/endpoints/download_file.py, lines 216-228:- Remove or replace
os.path.basenameand associated manual checks. - Use
os.path.normpath(and preferablyos.path.abspath) after joining the user-supplied path to the base directory. - Ensure the containment check is robust: on Unix, this is often done with
os.path.commonpath. - Use the normalized check before file access.
- Remove or replace
- Add any imports needed (
osis already imported).
| @@ -212,30 +212,22 @@ | ||
| # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk | ||
| # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk | ||
| if captureIdPath: | ||
| # Sanitize path to prevent directory traversal | ||
| safe_path = os.path.basename(captureIdPath) | ||
| if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path: | ||
| # Normalize path and ensure it is descendant of the allowed root | ||
| candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)) | ||
| abs_candidate_path = os.path.abspath(candidate_path) | ||
| dda_system_folder_abs = os.path.abspath(DDA_SYSTEM_FOLDER) | ||
| # Strict directory containment check using commonpath | ||
| if os.path.commonpath([abs_candidate_path, dda_system_folder_abs]) != dda_system_folder_abs: | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Invalid file path" | ||
| detail="Invalid or unsafe file path" | ||
| ) | ||
|
|
||
| # Construct full path within allowed directory | ||
| full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path) | ||
|
|
||
| # Ensure resolved path stays within DDA_SYSTEM_FOLDER | ||
| if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)): | ||
| if not os.path.exists(abs_candidate_path): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Access denied" | ||
| ) | ||
|
|
||
| if not os.path.exists(full_path): | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail="File not found", | ||
| ) | ||
| with open(full_path) as json_file: | ||
| with open(abs_candidate_path) as json_file: | ||
| inference_result_data_list = json.load(json_file) | ||
| else: | ||
| # Regular case, query the data ourselves |
| detail="File not found", | ||
| ) | ||
| with open(captureIdPath) as json_file: | ||
| with open(full_path) as json_file: |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
The optimal fix is to eliminate reliance on os.path.basename and instead normalize the provided path by using os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)). After normalization, check that the resultant path is strictly within DDA_SYSTEM_FOLDER by verifying it starts with the canonical root directory path. The validation should be based on canonical (absolute, normalized) paths, not just the user-provided string or basename.
Update the logic as follows:
- Remove
safe_path = os.path.basename(captureIdPath)and the related check. - Instead, resolve the candidate file path by normalizing with
os.path.normpathand then converting toos.path.abspath. - Check that the resultant path starts with the canonical root directory.
- Leave all other logic intact.
All changes should be made in the get_inference_result_data_for_retraining function in src/backend/endpoints/download_file.py.
| @@ -212,30 +212,21 @@ | ||
| # First, POST to /workflows/{workflow_id}/results/export, which writes out the data to a file on disk | ||
| # Then, GET from /workflows/{workflow_id}/results/export (here), which loads the data from said file on disk | ||
| if captureIdPath: | ||
| # Sanitize path to prevent directory traversal | ||
| safe_path = os.path.basename(captureIdPath) | ||
| if safe_path != captureIdPath or '..' in captureIdPath or os.path.sep in safe_path: | ||
| # Sanitize and validate path to prevent directory traversal | ||
| candidate_path = os.path.normpath(os.path.join(DDA_SYSTEM_FOLDER, captureIdPath)) | ||
| root_folder = os.path.abspath(DDA_SYSTEM_FOLDER) | ||
| resolved_path = os.path.abspath(candidate_path) | ||
| if not resolved_path.startswith(root_folder + os.sep): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Invalid file path" | ||
| detail="Invalid or disallowed file path" | ||
| ) | ||
|
|
||
| # Construct full path within allowed directory | ||
| full_path = os.path.join(DDA_SYSTEM_FOLDER, safe_path) | ||
|
|
||
| # Ensure resolved path stays within DDA_SYSTEM_FOLDER | ||
| if not os.path.abspath(full_path).startswith(os.path.abspath(DDA_SYSTEM_FOLDER)): | ||
| if not os.path.exists(resolved_path): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Access denied" | ||
| ) | ||
|
|
||
| if not os.path.exists(full_path): | ||
| raise HTTPException( | ||
| status_code=HTTP_404_NOT_FOUND, | ||
| detail="File not found", | ||
| ) | ||
| with open(full_path) as json_file: | ||
| with open(resolved_path) as json_file: | ||
| inference_result_data_list = json.load(json_file) | ||
| else: | ||
| # Regular case, query the data ourselves |
Fix Backend security issues